Inference of Multiscale Gaussian Graphical Model

Authors

Do Edmond Sanou, Christophe Ambroise, Geneviève Robin
Published

August 25, 2022

Abstract

Gaussian Graphical Models (GGMs) are widely used for exploratory data analysis in various fields such as genomics, ecology, and psychometry. In a high-dimensional setting, when the number of variables exceeds the number of observations by several orders of magnitude, estimating a GGM is a difficult and unstable optimization problem. Clustering of variables or variable selection is often performed prior to GGM estimation. We propose a new method that simultaneously infers a hierarchical clustering structure and the graphs describing the conditional independence structure at each level of the hierarchy. This method is based on solving a convex optimization problem that combines a graphical lasso penalty with a fused-type lasso penalty. Results on real and synthetic data are presented.


1 Introduction

1.1 About this document

This document provides a template, based on the Quarto system, for contributions to Computo (Computo Team 2021). We show how Python (Perez, Granger, and Hunter 2011) or R (R Core Team 2020) code can be included.

1.2 Advice for writing your manuscript

First, make sure that you are able to build your manuscript as a regular notebook on your system. Then you can start configuring the Binder environment.

2 Formatting

This section covers basic formatting guidelines. Quarto is a versatile formatting system for authoring HTML based on Markdown, integrating LaTeX and code blocks executed via either Jupyter or knitr (and thus supporting Python, R, and many other languages). It relies on Pandoc's Markdown markup language.

To render (compile) a document, run quarto render. This generates a document that includes both the content and the output of any code chunks embedded in it:

quarto render content.qmd # will render to html

2.1 Basic markdown formatting

Bold text or italic

  • This is a list
  • With more elements
  • It isn’t numbered.

We can also create a numbered list:

  1. This is my first item
  2. This is my second item
  3. This is my third item

2.2 Mathematics

2.2.1 Mathematical formulae

LaTeX code is natively supported[1], which makes it possible to typeset mathematical formulae such as:

f(x_1, \dots, x_n; \mu, \sigma^2) = \frac{1}{(\sigma \sqrt{2\pi})^n} \exp{\left(- \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i - \mu)^2\right)}
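To make the normalizing constant explicit, here is a minimal Python sketch (standard library only; the function names joint_density and product_of_densities are illustrative, not part of any library). It evaluates the joint density of n i.i.d. N(μ, σ²) observations in closed form, with the factor (σ√(2π))^(-n), and checks it against the product of the n univariate densities:

```python
import math


def joint_density(xs, mu, sigma2):
    """Closed-form joint density of i.i.d. N(mu, sigma2) observations."""
    n = len(xs)
    sigma = math.sqrt(sigma2)
    norm_const = 1.0 / (sigma * math.sqrt(2.0 * math.pi)) ** n
    sq_sum = sum((x - mu) ** 2 for x in xs)
    return norm_const * math.exp(-sq_sum / (2.0 * sigma2))


def product_of_densities(xs, mu, sigma2):
    """Same quantity, computed as a product of univariate normal densities."""
    sigma = math.sqrt(sigma2)
    out = 1.0
    for x in xs:
        out *= math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / (sigma * math.sqrt(2.0 * math.pi))
    return out


xs = [0.3, -1.2, 0.8, 2.1]  # arbitrary illustrative sample
print(joint_density(xs, mu=0.5, sigma2=1.5))
print(product_of_densities(xs, mu=0.5, sigma2=1.5))
```

Both calls print the same value up to floating-point rounding. With a single observation, the formula reduces to the familiar univariate density.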

It is also possible to cross-reference an equation; see Equation 1:

\begin{aligned} D_{x_N} & = \frac12 \left[\begin{array}{cc} x_L^\top & x_N^\top \end{array}\right] \, \left[\begin{array}{cc} L_L & B \\ B^\top & L_N \end{array}\right] \, \left[\begin{array}{c} x_L \\ x_N \end{array}\right] \\ & = \frac12 (x_L^\top L_L x_L + 2 x_N^\top B^\top x_L + x_N^\top L_N x_N), \end{aligned} \tag{1}
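The key step in Equation 1 is that the two off-diagonal contributions, x_L^⊤ B x_N and x_N^⊤ B^⊤ x_L, are scalars that are transposes of each other and therefore equal, which produces the factor 2. A short numerical check in plain Python confirms that both sides agree. The block sizes, random values, and helper names are arbitrary choices made for illustration:

```python
import random


def matvec(M, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def transpose(M):
    return [list(col) for col in zip(*M)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
p, q = 3, 2  # dimensions of x_L and x_N (arbitrary)
L_L = random_matrix(p, p, rng)
L_N = random_matrix(q, q, rng)
B = random_matrix(p, q, rng)
x_L = [rng.uniform(-1.0, 1.0) for _ in range(p)]
x_N = [rng.uniform(-1.0, 1.0) for _ in range(q)]
Bt = transpose(B)

# Left-hand side: assemble [[L_L, B], [B^T, L_N]] and the stacked vector [x_L; x_N].
full = [L_L[i] + B[i] for i in range(p)] + [Bt[i] + L_N[i] for i in range(q)]
x = x_L + x_N
lhs = 0.5 * dot(x, matvec(full, x))

# Right-hand side: the expanded form on the second line of Equation 1.
rhs = 0.5 * (dot(x_L, matvec(L_L, x_L))
             + 2 * dot(x_N, matvec(Bt, x_L))
             + dot(x_N, matvec(L_N, x_N)))
print(lhs, rhs)
```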

2.2.2 Theorems and other amsthem-like environments

Quarto provides convenient support for theorems, with predefined prefix labels for theorems, lemmas, propositions, etc. (see this page). Here is a simple example:

Theorem 1 (Strong law of large numbers) The sample average converges almost surely to the expected value:

\overline{X}_n\ \xrightarrow{\text{a.s.}}\ \mu \qquad\textrm{when}\ n \to \infty.

See Theorem 1.
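Theorem 1 can be illustrated, though not proved, by simulation. The sketch below is plain Python. The Uniform(0, 1) distribution, the seed, and the sample sizes are arbitrary choices. It tracks the running sample average, which should approach μ = 0.5 as n grows:

```python
import random


def running_means(draws):
    """Return the sample averages X̄_1, ..., X̄_n of a sequence of draws."""
    means, total = [], 0.0
    for k, x in enumerate(draws, start=1):
        total += x
        means.append(total / k)
    return means


rng = random.Random(42)
# Uniform(0, 1) draws have expected value mu = 0.5.
draws = [rng.random() for _ in range(100_000)]
means = running_means(draws)
for n in (10, 1_000, 100_000):
    # Distance between the sample average and mu after n draws.
    print(n, abs(means[n - 1] - 0.5))
```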

2.3 Code

Quarto uses either Jupyter or knitr to execute code chunks. The engine can be selected in the YAML header. For example, for Jupyter (which must be installed on your computer), use:

---
title: "My Document"
author: "Jane Doe"
jupyter: python3
---

For knitr (R and knitr must be installed on your computer), use:

---
title: "My Document"
author: "Jane Doe"
---

You can use Jupyter for Python code and more, or R with knitr if you want to mix R and Python (via the reticulate package; Ushey, Allaire, and Tang 2020).

2.3.1 R

R code chunks (R Core Team 2020) may be embedded as follows:

x <- rnorm(10)

2.3.2 Python

When Python is run through Jupyter, the engine is declared in the YAML header:

---
title: "My Document"
author: "Jane Doe"
jupyter: python3
---
Python code chunks may then be embedded as follows:
import matplotlib.pyplot as plt
import numpy as np

# Draw a simple line plot of the values 0, 1, ..., 9 against their index
fig, ax = plt.subplots()
ax.plot(np.arange(10))

2.4 Figures

Plots can be generated as follows:

library("ggplot2")
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
p

Interactive plots may also be produced in the HTML output of the document:

library("plotly")
ggplotly(p)

It is also possible to create figures from static images:

2.5 Tables

Tables (the label @tbl-mylabel renders as Table 1) can be generated with Markdown as follows:

Table 1: my table caption

| Tables   | Are           | Cool  |
|:---------|:-------------:|------:|
| col 1 is | left-aligned  | $1600 |
| col 2 is | centered      | $12   |
| col 3 is | right-aligned | $1    |
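Since pipe tables like Table 1 are plain text, they are easy to generate programmatically. The following is a minimal sketch. It assumes Pandoc's pipe-table syntax, where colons in the separator row set the column alignment. The helper name markdown_table is hypothetical:

```python
def markdown_table(header, rows, align):
    """Build a Markdown pipe table; align holds 'left', 'center' or 'right' per column."""
    markers = {"left": ":---", "center": ":---:", "right": "---:"}
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(markers[a] for a in align) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


table = markdown_table(
    header=["Tables", "Are", "Cool"],
    rows=[["col 1 is", "left-aligned", "$1600"],
          ["col 2 is", "centered", "$12"],
          ["col 3 is", "right-aligned", "$1"]],
    align=["left", "center", "right"],
)
print(table)
```

The columns are not padded to equal width. Pandoc does not require padding in pipe tables; it only improves readability of the source.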

Tables can also be generated by code, for instance with knitr:

knitr::kable(summary(cars), caption = "Table caption.")
Table caption.

| speed        | dist           |
|:-------------|:---------------|
| Min.   : 4.0 | Min.   :  2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean   :15.4 | Mean   : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max.   :25.0 | Max.   :120.00 |
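For readers working in Python, the Python standard library can reproduce the six statistics shown in each column of the table above, which R's summary() prints for a numeric vector. This is a sketch under stated assumptions: r_style_summary is a hypothetical helper, the input is toy data rather than the cars dataset, and the correspondence between statistics.quantiles(..., method="inclusive") and R's default quantile type 7 is what the quartile values rely on:

```python
import statistics


def r_style_summary(values):
    """Six-number summary mirroring the rows of R's summary() for a numeric vector.

    method="inclusive" interpolates linearly between order statistics, which
    corresponds to R's default quantile type 7.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Illustrative data only (not the cars dataset).
for label, value in r_style_summary([1, 2, 3, 4]).items():
    print(f"{label:8s}: {value:.2f}")
```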

2.6 Algorithms

To typeset pseudocode as you would in LaTeX, but with HTML output, one solution is to rely on the JavaScript library pseudocode.js. Write your pseudocode inside a <pre> tag, then modify the file includes/pseudocode-footer.html so that the id of the rendered element matches the one in <pre id="">. The result is as follows:

\begin{algorithm}
\caption{A simple Algorithm}
\begin{algorithmic}
\STATE \textbf{Data}: $\mathcal{X} = \{x_1, \dots, x_n\}$
\STATE optimization parameters: number of iterations $T$, learning rate $\eta$
\STATE \textbf{Result}: output $\mathcal{Y} = \{y_1, \dots, y_n\}$
\PROCEDURE{myproc}{$T$, $\eta$}
    \FOR{$t = 0$ \TO $T$}
        \STATE do something (and fast please)
    \ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
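The pseudocode maps directly onto an ordinary loop. In the Python sketch below, the body "do something" is replaced by a caller-supplied update function. That update function, and the toy shrinkage rule used in the example, are hypothetical stand-ins chosen purely for illustration. Note that "FOR t = 0 TO T" is inclusive and therefore runs T + 1 iterations:

```python
from typing import Callable, List


def myproc(xs: List[float], T: int, eta: float,
           update: Callable[[float, float], float]) -> List[float]:
    """Python skeleton of the pseudocode: loop t = 0, ..., T (inclusive).

    `update` is a hypothetical placeholder for "do something"; it maps a
    current value and the learning rate eta to a new value.
    """
    ys = list(xs)  # the result Y starts as a copy of the data X
    for _ in range(T + 1):  # "FOR t = 0 TO T" runs T + 1 iterations
        ys = [update(y, eta) for y in ys]
    return ys


def shrink(y: float, eta: float) -> float:
    """Toy update (an assumption for illustration): shrink a value toward 0."""
    return (1 - eta) * y


print(myproc([1.0, 2.0, 4.0], T=2, eta=0.5, update=shrink))
```

With T = 2 the update is applied three times, so each value is multiplied by 0.5³ = 0.125.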

2.7 Handling references

2.7.1 Bibliographic references

References are managed with BibTeX: for example, [@computo] is displayed as (Computo Team 2021), where computo is the BibTeX key for this specific entry. The bibliographic information is automatically retrieved from the .bib file specified in the header of this document (here: references.bib).

2.7.2 Other cross-references

As already (partially) seen, Quarto includes a mechanism similar to bibliographic references for sections, equations, theorems, figures, lists, etc. Have a look at this page.

For more information

Check our mock version of the t-SNE paper for a full and advanced example.

Session information

sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
[5] LC_TIME=French_France.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plotly_4.10.0 ggplot2_3.3.6

loaded via a namespace (and not attached):
 [1] reticulate_1.25   tidyselect_1.1.2  xfun_0.31         purrr_0.3.4      
 [5] splines_4.2.1     lattice_0.20-45   colorspace_2.0-3  vctrs_0.4.1      
 [9] generics_0.1.3    viridisLite_0.4.0 htmltools_0.5.3   yaml_2.3.5       
[13] mgcv_1.8-40       utf8_1.2.2        rlang_1.0.4       pillar_1.8.0     
[17] glue_1.6.2        withr_2.5.0       DBI_1.1.3         lifecycle_1.0.1  
[21] stringr_1.4.0     munsell_0.5.0     gtable_0.3.0      htmlwidgets_1.5.4
[25] evaluate_0.16     labeling_0.4.2    knitr_1.39        fastmap_1.1.0    
[29] crosstalk_1.2.0   fansi_1.0.3       highr_0.9         Rcpp_1.0.9       
[33] scales_1.2.0      jsonlite_1.8.0    farver_2.1.1      png_0.1-7        
[37] digest_0.6.29     stringi_1.7.8     dplyr_1.0.9       grid_4.2.1       
[41] cli_3.3.0         tools_4.2.1       magrittr_2.0.3    lazyeval_0.2.2   
[45] tibble_3.1.8      tidyr_1.2.0       pkgconfig_2.0.3   Matrix_1.4-1     
[49] data.table_1.14.2 assertthat_0.2.1  rmarkdown_2.14    httr_1.4.3       
[53] rstudioapi_0.13   R6_2.5.1          nlme_3.1-157      compiler_4.2.1   

References

Computo Team. 2021. “Computo: Reproducible Computational/Algorithmic Contributions in Statistics and Machine Learning.” Computo.
Perez, Fernando, Brian E Granger, and John D Hunter. 2011. “Python: An Ecosystem for Scientific Computing.” Computing in Science & Engineering 13 (2): 13–21.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2020. Reticulate: Interface to Python. https://github.com/rstudio/reticulate.

Footnotes

  1. We use KaTeX for this purpose.


Citation

BibTeX citation:
@article{edmondsanou_christopheambroise_genevieverobin,
  author = {Do Edmond Sanou and Christophe Ambroise and Geneviève Robin},
  title = {Inference of {Multiscale} {Gaussian} {Graphical} {Model}},
  journal = {Computo},
  date = {},
  url = {https://github.com/desanou/multiscale_glasso},
  doi = {xxxx},
  langid = {en},
  abstract = {Gaussian Graphical Models (GGMs) are widely used for
    exploratory data analysis in various fields such as genomics,
    ecology, psychometry. In a high-dimensional setting, when the number
    of variables exceeds the number of observations by several orders of
    magnitude, the estimation of GGM is a difficult and unstable
    optimization problem. Clustering of variables or variable selection
    is often performed prior to GGM estimation. We propose a new method
    allowing to simultaneously infer a hierarchical clustering structure
    and the graphs describing the structure of independence at each
    level of the hierarchy. This method is based on solving a convex
    optimization problem combining a graphical lasso penalty with a
    fused type lasso penalty. Results on real and synthetic data are
    presented.}
}
For attribution, please cite this work as:
Do Edmond Sanou, Christophe Ambroise, Geneviève Robin. n.d. “Inference of Multiscale Gaussian Graphical Model.” Computo. https://doi.org/xxxx.